A new dataset called GPQA presents a real challenge with 448 tough multiple-choice questions in biology, physics, and chemistry. Even domain experts struggle, scoring around 65% accuracy, while non-experts manage only 34%. Advanced AI systems like GPT-4 achieve just 39% accuracy. The dataset is intended to support scalable oversight research: developing methods for supervising AI outputs on hard scientific questions whose answers humans cannot easily verify.
Tuesday, March 5, 2024

Researchers have developed a new technique called Resonance RoPE to help LLMs better understand and generate text in longer sequences than they were originally trained on. This method, which improves on the existing Rotary Position Embedding (RoPE) system, enhances model performance on long texts without extra computing effort.
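For context, the underlying RoPE mechanism rotates each query/key channel pair by a position-dependent angle. The sketch below shows vanilla RoPE in NumPy; it is illustrative only and is not the Resonance variant itself.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply vanilla rotary position embeddings to x of shape (seq_len, dim).

    Pairs of channels are rotated by an angle that grows with position and
    shrinks with channel index -- the standard RoPE recipe that Resonance
    RoPE refines for sequence lengths beyond the training range.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    freqs = base ** (-np.arange(half) * 2.0 / dim)          # (half,)
    angles = np.outer(np.arange(seq_len), freqs)            # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(16, 64)   # toy queries: 16 positions, one 64-dim head
print(rope(q).shape)          # (16, 64)
```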
The All-Seeing Project V2 introduces the ASMv2 model, which blends text generation, object localization, and understanding the connections between objects in images.
SURE is a novel method that combines various techniques to improve the reliability of uncertainty predictions in deep neural networks, especially for image classification tasks.
The effectiveness of large language models is primarily influenced by the quality of their training data. Projections suggest that high-quality data will be scarce by 2027. Synthetic data generation emerges as a promising solution to this challenge, potentially reshaping internet business models and highlighting the importance of equitable data access and antitrust considerations.
Researchers used synthetic data to generate 2 million path problems. They then trained a 7B model and found strong performance against state-of-the-art large language models.
Turning an image of a design into code is a challenging task. This work proposes a benchmark, an 18B model, and evaluations suggesting that we are close to being able to do this for simple designs. In some cases, GPT-4V-generated code is even preferred over the original human-written code.
The KEPP system introduces a new approach to planning and executing complex tasks. The method, which leverages a probabilistic knowledge graph, allows the model to logically sequence actions towards achieving a goal.
The Yi models have long been among the most powerful open language models. The team has released a paper that contains substantial insights into their data collection and training processes.
This paper introduces metaheuristics, a diverse set of over 100 discrete optimization methods, as a powerful tool for improving prompt learning in large language models.
This work shows that you can train models individually and then merge them together into a single Mixture-of-Experts model.
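As a loose illustration of that recipe (hypothetical module and names, not the paper's exact procedure), the sketch below copies the FFNs of separately trained dense models into expert slots and adds a freshly initialized router that would be trained afterwards.

```python
import torch
import torch.nn as nn

class MergedMoE(nn.Module):
    """Toy MoE layer whose experts are FFNs lifted from separately trained dense models."""
    def __init__(self, dense_ffns, d_model, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(dense_ffns)           # reuse trained FFN weights as experts
        self.router = nn.Linear(d_model, len(dense_ffns))  # new router, trained after merging
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)                  # normalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

d = 32
dense_ffns = [nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)) for _ in range(4)]
layer = MergedMoE(dense_ffns, d_model=d)
print(layer(torch.randn(10, d)).shape)   # torch.Size([10, 32])
```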
Evaluating language models trained to code is a challenging task. Most folks use HumanEval from OpenAI. However, some open models seem to overfit to this benchmark. LiveCodeBench is a way to measure coding performance while mitigating contamination concerns.
Next-token prediction is a simple objective that leads to complex behaviors. This work found that a single self-attention layer trained with gradient descent broke the problem down into hard retrieval and soft composition, which enabled in-context learning and strong overall performance.
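For reference, the sketch below sets up the minimal object of study: a single-head causal self-attention layer trained with gradient descent on next-token prediction. The hard-retrieval/soft-composition decomposition is the work's analysis and is not reproduced here.

```python
import torch
import torch.nn as nn

class OneLayerAttentionLM(nn.Module):
    """A single causal self-attention layer with a linear readout:
    a minimal next-token predictor of the kind analyzed in this line of work."""
    def __init__(self, vocab_size, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens):                               # tokens: (batch, seq)
        x = self.embed(tokens)
        T = tokens.shape[1]
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        out, _ = self.attn(x, x, x, attn_mask=causal)        # causal self-attention
        return self.head(out)                                # next-token logits

model = OneLayerAttentionLM(vocab_size=50)
tokens = torch.randint(0, 50, (4, 16))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 50), tokens[:, 1:].reshape(-1))
loss.backward()   # ordinary gradient-descent training on next-token prediction
```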
YOLOX-ViT introduces a new approach to object detection in underwater robotics by integrating visual transformers and knowledge distillation.
A weird fact of modern language modeling is that we train a tokenizer first before training the model. The second weird fact is that vocab size doesn't seem to matter too much at large scales.
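For context, the tokenizer step looks roughly like the sketch below, which trains BPE tokenizers at a few vocab sizes with the Hugging Face tokenizers library before any model training happens (the corpus and sizes are placeholders).

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

corpus = ["the quick brown fox jumps over the lazy dog"] * 1000   # placeholder corpus

for vocab_size in (1_000, 8_000, 32_000):       # sweep vocab size before any model training
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]", "[PAD]"])
    tokenizer.train_from_iterator(corpus, trainer)   # step 1: fit the tokenizer
    ids = tokenizer.encode("a sentence the model will later see").ids
    print(vocab_size, len(ids))                      # step 2 (model training) happens afterwards
```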
OpenAI published research on giving system prompts stronger weighting, which dramatically improves model robustness to jailbreaks and adversarial attacks.
Most modern AI is built around the idea of compressing a training dataset into a model. The better the compression, the better the model. This paper studies that relationship rigorously and finds that, at scale, benchmark scores correlate strongly with a model's ability to compress novel text.
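The compression measure in this line of work is typically bits-per-byte under the model's next-token predictions. A minimal sketch below uses GPT-2 as a stand-in model; it is not the paper's exact evaluation harness.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Novel text the model has (hopefully) never seen during training."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # labels=ids gives the mean cross-entropy (nats per predicted token)
    nll_per_token = model(ids, labels=ids).loss.item()

n_predicted = ids.shape[1] - 1                    # the first token has no prediction
total_bits = nll_per_token * n_predicted / math.log(2)
bits_per_byte = total_bits / len(text.encode("utf-8"))
print(f"{bits_per_byte:.3f} bits/byte")           # lower = better compression
```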
Wednesday, April 17, 2024

TransformerFAM provides a feedback mechanism that allows Transformers to attend to their own latent representations. This can, in theory, introduce recurrence into the model for processing extremely long inputs in context.
Another long context paper - this time, a new architecture that uses two novel weight-updating schemes. It outperforms Llama 2 when trained on the same number of tokens (2T), and it scales to unlimited context length at inference time.
Vision-language models (VLMs) often struggle with processing multiple queries per image and with identifying when objects are absent. This study introduces a new query format to tackle these issues and incorporates semantic segmentation into the training process.
Mathematicians and Google DeepMind researchers have used AI to find large collections of objects that lack specific patterns, a problem tied to bounding worst-case scenarios such as how badly the internet could be severed by server outages. Their approach employs large language models to iteratively generate and refine these pattern-free collections. The work reflects the combined power of AI and human ingenuity in tackling hard combinatorial problems.
Researchers have developed a new method called Federated Proxy Fine-Tuning (FedPFT) that improves the adaptation of foundation models for specific tasks while preserving data privacy.
This paper introduces a new approach to enhancing In-Context Learning (ICL) in large language models like Llama-2 and GPT-J. Its authors present a new optimization method that refines what they call 'state vectors' — compressed representations of the model's knowledge.
Meta's LLaMA3, a leading large language model, is being tested for its efficiency in low-bit scenarios, often essential in systems with limited resources. This study, available on GitHub and Hugging Face, aims to refine and improve quantization strategies for future large language models.
Recent experiments introduced "Reasoning Tokens" to improve the thinking process of language models like GPT-2, encouraging them to make calculations for future tokens. Early results show a 35% decrease in loss, indicating the models can indeed learn to anticipate future information. This approach could enhance the ability of language models to plan and reason in a self-supervised manner, potentially reducing the need for step-by-step explanations.
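A rough sketch of how such an experiment can be wired up with GPT-2 in Hugging Face Transformers is shown below. The token name and the interleaving scheme are illustrative guesses, not the experiment's exact setup: add a special reasoning token, interleave it into the sequence, and mask it out of the loss so the model can "think" at those positions without being graded on them.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical reasoning token; the real experiment may use a different scheme.
tok.add_special_tokens({"additional_special_tokens": ["<reason>"]})
model.resize_token_embeddings(len(tok))
reason_id = tok.convert_tokens_to_ids("<reason>")

ids = tok("2 plus 2 equals 4", return_tensors="pt").input_ids[0]
# Interleave one reasoning token before every real token.
interleaved = torch.stack([torch.full_like(ids, reason_id), ids], dim=1).flatten()
labels = interleaved.clone()
labels[interleaved == reason_id] = -100     # no loss on reasoning positions

out = model(interleaved.unsqueeze(0), labels=labels.unsqueeze(0))
out.loss.backward()                         # gradients still flow through the reasoning positions
```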
Researchers have developed a novel approach called FFNification that transforms self-attention mechanisms into more efficient token mixers using only convolutions while keeping the query-key-value framework.
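The summary suggests roughly the following shape, sketched below as a hypothetical module (not the authors' implementation): keep the query/key/value projections, but do the token mixing with a depthwise convolution over the sequence instead of an attention matrix.

```python
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Illustrative conv-based token mixer that keeps Q/K/V projections
    but mixes tokens with a depthwise convolution instead of attention."""
    def __init__(self, d_model, kernel_size=7):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Depthwise conv over the sequence dimension replaces softmax(QK^T)V
        self.mix = nn.Conv1d(d_model, d_model, kernel_size,
                             padding=kernel_size // 2, groups=d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                                # x: (batch, seq, d_model)
        gate = torch.sigmoid(self.q(x) * self.k(x))      # cheap query-key gating (illustrative)
        v = self.v(x).transpose(1, 2)                    # (batch, d_model, seq)
        mixed = self.mix(v).transpose(1, 2)              # local token mixing via convolution
        return self.out(gate * mixed)

x = torch.randn(2, 32, 64)
print(ConvTokenMixer(64)(x).shape)   # torch.Size([2, 32, 64])
```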
Researchers have revisited the use of ReLU activation functions in learning implicit neural representations (INRs). Inspired by second-order B-spline wavelets, they introduced simple constraints to ReLU neurons to counter spectral bias.
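For reference, an implicit neural representation is just a coordinate MLP fit to a signal. The sketch below is a plain ReLU INR baseline on a 1D signal; it does not include the paper's B-spline-inspired constraints, which are what counter the spectral bias.

```python
import torch
import torch.nn as nn

# A plain ReLU coordinate MLP: maps coordinates -> signal values.
# The paper constrains the ReLU neurons (inspired by second-order B-spline
# wavelets); this sketch shows only the unconstrained baseline.
inr = nn.Sequential(
    nn.Linear(1, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

coords = torch.linspace(-1, 1, 256).unsqueeze(-1)    # 1D coordinates
signal = torch.sin(8 * torch.pi * coords)            # target signal to memorize

opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((inr(coords) - signal) ** 2).mean()
    loss.backward()
    opt.step()
print(f"final MSE: {loss.item():.4f}")
```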
Super persuasion is the fear that, as models grow larger, they will become substantially more persuasive. The evidence so far weakly suggests that larger models are not substantially more persuasive than smaller ones; however, they may be tunable to become more persuasive.
MacroHFT is a new approach to high-frequency trading (HFT) in cryptocurrency markets that leverages reinforcement learning to improve decision-making and profitability.
Researchers have improved QMIX, a popular method for multi-agent reinforcement learning, by adding a local Q-value learning method within a maximum entropy framework.
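The maximum-entropy ingredient typically replaces the hard max in the Bellman target with a soft value. Below is a minimal sketch of a soft target for one agent's local Q-values; this is generic soft Q-learning, not the paper's full QMIX variant.

```python
import torch

def soft_q_target(q_next, reward, alpha=0.1, gamma=0.99):
    """Soft Bellman target for local Q-values under a maximum-entropy objective.

    V(s') = alpha * logsumexp(Q(s', .) / alpha) replaces the usual max,
    which keeps the per-agent policy stochastic (a softmax over Q / alpha).
    """
    v_next = alpha * torch.logsumexp(q_next / alpha, dim=-1)   # soft value of next state
    return reward + gamma * v_next

q_next = torch.tensor([[1.0, 2.0, 0.5]])   # one agent, three actions
reward = torch.tensor([0.3])
print(soft_q_target(q_next, reward))       # approximately tensor([2.2800])
```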